Method for enhancing recognition probability in voice recognition systems

ABSTRACT

The invention relates to a method for enhancing recognition probability in voice recognition systems. According to the inventive method, selective post-training of the already stored homonymic term is carried out after inputting a term to be recognized. This makes it possible to improve the speaker-dependent recognition rate even in environments with prevailing acoustic interference.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is entitled to the benefit of and incorporates by reference in their entireties essential subject matter disclosed in International Application No. PCT/DE99/00137, filed on Jan. 20, 1999 and German Patent Application No. 19894047.4, filed on Feb. 3, 1998.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The invention relates to a method for enhancing the recognition probability in voice recognition systems.

2. Description of the Related Art

U.S. Pat. No. 5,617,468 discloses a method for enhancing the recognition probability in voice recognition systems wherein, after input of a term to be recognized, a post-training of the previously stored homonymic term by means of the input term is carried out. This method does not permit reliable voice recognition in acoustically changing environments.

EP Patent No. 0 241 163 relates to a voice recognition method that alerts the user if a term to be recognized is already stored in a similar form in the voice recognition system and the two terms might be confused. This method also does not offer a reliable recognition of terms in changing acoustic environments.

BRIEF SUMMARY OF THE INVENTION

Voice recognition systems today are used primarily in computers, communication systems and other technical equipment where ease of operation or fast data input is important. The prior art systems, however, are not mature and are flawed in operation, particularly if they are operated in environments with acoustic interference. In this case, a word to be recognized is often misrecognized or not recognized at all. As a result, the user must repeat the word several times, which causes unreasonable delays if recognition errors occur frequently.

Thus, the object of the present invention is to propose a method for voice recognition that enhances the speaker-dependent recognition rate in a user-friendly manner, particularly in environments with acoustic interference.

This object is attained by the characteristic features of claim 1.

Advantageous modifications and further developments of the invention are set forth in the dependent claims.

The method according to the invention enhances the recognition rate, particularly in environments with acoustic interference.

The invention proposes that in voice recognition systems a post-training of the voice patterns of newly input terms, which may have been misrecognized or not recognized at all, be carried out. Post-training means that a misrecognized or non-recognized term is not simply overwritten after repeated input, but is compared and correlated with the previously input terms or supplemented by a new pattern in order to reduce or mask out patterns or incidental noise that are unimportant for recognizing the term. The invention is intended, in particular, for use in voice recognition systems that operate in environments with acoustic interference, e.g. in mobile radiocommunications terminals, telephones, etc.

If a term previously stored as a voice pattern is successfully recognized, the stored voice pattern is post-trained with the newly recorded pattern. In case of successful recognition of each previously recorded term, this post-training is carried out n times, where the number of passes can be freely selected and changed at any time. This post-training makes it possible to continuously reduce the influence of variable incidental noise on the actual constant voice pattern of a word.
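The text does not state how the stored pattern is updated. As a minimal sketch, assuming a voice pattern is a fixed-length feature vector and post-training is a running average (both are illustrative assumptions, as are the names `StoredPattern` and `post_train`), repeated successful recognitions average out variable noise while the constant part of the pattern remains:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoredPattern:
    """Hypothetical stored voice pattern: feature vector plus pass counter."""
    features: List[float]
    trained: int = 0  # number of post-training passes applied so far


def post_train(stored: StoredPattern, new_features: List[float], n: int) -> StoredPattern:
    """Merge a newly recorded pattern into the stored one by running average.

    Assumption: after n passes (the freely selectable number of passes)
    the stored pattern is left unchanged.
    """
    if len(new_features) != len(stored.features):
        raise ValueError("patterns must have the same length")
    if stored.trained >= n:
        return stored
    # The stored vector already represents (trained + 1) utterances.
    weight = stored.trained + 1
    merged = [
        (old * weight + new) / (weight + 1)
        for old, new in zip(stored.features, new_features)
    ]
    return StoredPattern(merged, stored.trained + 1)
```

With this weighting every utterance contributes equally, so a noise component present in only one recording shrinks with each further pass.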

According to a further development of the invention, if the recognition of a word is uncertain, the system prompts for a renewed input, which is used for post-training. Here, too, the maximum number of passes can be freely selected and changed at any time. If the comparison of an input voice pattern with stored voice patterns results in a similar and little differentiated recognition probability for several stored terms, the system plays these terms for the user and prompts the user to repeat the initially input term. If recognition is then successful, post-training is interrupted. For reasons of clarity, it is preferred to limit the number of the possible terms that are output by the system in case of uncertain recognition to a predefined number, and to limit the repetition of the process to, e.g. three direct repetitions.
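One possible reading of "similar and little differentiated recognition probability" is sketched below. The margin criterion, its default value and the function name `ambiguous_candidates` are assumptions for illustration; the text only requires that the number of terms played back be limited to a predefined number.

```python
from typing import Dict, List


def ambiguous_candidates(
    scores: Dict[str, float], margin: float = 0.05, max_terms: int = 3
) -> List[str]:
    """Return the terms to play back, or [] if recognition is unambiguous.

    scores maps each stored term to its recognition probability for the input.
    Assumption: recognition is uncertain when at least one runner-up lies
    within `margin` of the best score.
    """
    if not scores:
        return []
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best = ranked[0][1]
    close = [term for term, score in ranked if best - score <= margin]
    if len(close) < 2:
        return []  # a single clear winner: unambiguous recognition
    return close[:max_terms]  # limit the output to the predefined number
```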

Another further development provides that in case of a new input of a word or term in the system, the system carries out a comparison with previously stored terms after the new term has been input. The voice recognition system is thus used to compare a new term with previously stored terms and to determine whether the voice pattern of the new term is sufficiently distinct from the voice patterns of the previously stored terms that a misrecognition or uncertain recognition is not expected. However, if in the context of a correlation comparison based on a defined criterion of uncertain recognition, the new voice pattern results in a strong similarity or probable match with previously stored voice patterns, the system optically or acoustically informs the user and prompts him to repeat the input of the new term, which serves for post-training. The number of consecutive repetitions can be freely selected and changed at any time.
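The correlation comparison is not specified further. A minimal sketch, assuming voice patterns are equal-length feature vectors, the comparison is a Pearson correlation and the "defined criterion" is a fixed threshold (all three are assumptions, as are the function names), could look like this:

```python
import math
from typing import Dict, List, Sequence


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length patterns (0.0 if one is constant)."""
    if len(a) != len(b) or not a:
        raise ValueError("patterns must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def confusable_terms(
    new_pattern: Sequence[float],
    stored: Dict[str, Sequence[float]],
    threshold: float = 0.9,
) -> List[str]:
    """List stored terms whose pattern correlates strongly with the new one."""
    return [
        term
        for term, pattern in stored.items()
        if correlation(new_pattern, pattern) >= threshold
    ]
```

An empty result means the new term is sufficiently distinct; a non-empty result would trigger the optical or acoustic notice and the prompt to repeat the input.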

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

The invention will now be described in greater detail by means of several drawing figures. The drawing figures and the pertaining description will illustrate additional features of the invention. The following show:

FIG. 1: a schematic flowchart of the method according to the invention based on the example of a voice recognition system used in a mobile telephone unit;

FIG. 2: a schematic flowchart of the method used in processing new inputs.

DETAILED DESCRIPTION OF THE INVENTION

The inventive method will now be described in greater detail by means of the drawing figures in connection with a mobile telephone unit. A mobile telephone unit with voice recognition and voice output is assumed. The mobile telephone unit has a telephone directory with name entries, each of which is associated with a corresponding dial number. Through voice input of a name stored in the telephone directory, the user can trigger a dial process or some other action.

According to FIG. 1, after successful recognition of a name previously stored as a voice pattern in the telephone directory, the stored voice pattern is post-trained with the newly recorded one. If the recognition of a name is uncertain, the system prompts for a renewed input, which is then used for post-training.

The variables k and k_max, initially defined in Step 1, denote the current number and the maximum number of training passes, respectively.

After prompting for and input of the name according to Step 2, a comparison is first made in Step 3 to determine whether the maximum number of post-training passes has been reached. If this is true, the operation is interrupted in Step 4. If the maximum number of passes has not been reached, the voice recognition system compares the name that has been input with the names previously stored in the telephone directory in Step 5. If the name that has been input is unambiguously recognized in Step 6, the action requested by the user is carried out in Step 7, e.g. a connection with the requested conversation partner is set up. Furthermore, in Step 8, a post-training of the corresponding telephone directory entry is carried out with the name that was last input and recognized as correct. The action is terminated in Step 9.

If, however, the name that has been input is not unambiguously recognized, i.e. the comparison of the input voice pattern with stored voice patterns in Step 6 results in a similar and little differentiated recognition probability for several stored names, the system informs the user and plays the name that is similar to the input name in Step 10. In Step 11 the user is prompted to repeat the initially input name, which is input again in Step 12. The counter k is increased by one in Step 13, and the routine jumps back to Step 3 and is repeated from this step.
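The control flow of FIG. 1 as described above can be sketched as a loop. This is an illustration only: the callables `listen`, `match`, `on_recognized` and `play_back` are hypothetical stand-ins for the telephone's audio input, recognizer, action plus post-training, and voice output, and utterances are represented as plain strings.

```python
from typing import Callable, List, Optional


def recognize_with_retries(
    listen: Callable[[], str],
    match: Callable[[str], List[str]],
    on_recognized: Callable[[str, str], None],
    play_back: Callable[[List[str]], None],
    k_max: int = 3,
) -> Optional[str]:
    """Return the recognized name, or None if k_max passes were used up."""
    k = 0                                   # Step 1: initialize the counter
    utterance = listen()                    # Step 2: prompt and input
    while k < k_max:                        # Step 3: maximum reached?
        candidates = match(utterance)       # Step 5: compare with directory
        if len(candidates) == 1:            # Step 6: unambiguous?
            # Steps 7 and 8: carry out the action, post-train the entry.
            on_recognized(candidates[0], utterance)
            return candidates[0]            # Step 9: terminate
        play_back(candidates)               # Step 10: play similar names
        utterance = listen()                # Steps 11 and 12: prompt, re-input
        k += 1                              # Step 13: increase counter
    return None                             # Step 4: interrupt
```

In this sketch an empty candidate list (no match at all) is treated like an uncertain recognition, which is an assumption; the text only describes the case of several similar names.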

According to FIG. 2, if a name is newly entered in the telephone directory, a comparison with the previously stored names is carried out.

First the loop counter k is set to zero in Step 20.

The user is then prompted to input the new name in Step 21.

In Step 22 the system then checks whether the number of passes has exceeded the defined value. If this is true, the new name is stored in the telephone directory in Step 25 and the process is terminated in Step 26. If the counter is lower than the predefined value, the voice recognition system compares the new name with previously stored names in Step 23. If according to Step 24, the voice pattern is sufficiently distinct from the voice patterns of the previously stored names that a misrecognition or uncertain recognition is not expected, the new name is stored in the telephone directory in Step 25 and the process is terminated in Step 26. However, if within the context of a correlation comparison based on the defined criterion of uncertain recognition, the new voice pattern shows a strong similarity or probable match with previously stored names, the system optically or acoustically informs the user in Step 27 and prompts him to repeat the input of the new name in Step 28. The new name can be reinput in Step 29. The repeated input serves for post-training in Step 30. The number of consecutive repetitions can be freely selected and changed at any time. After each pass, the counter is increased by one in Step 31, and the routine jumps back to Step 21.
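The FIG. 2 procedure can be sketched in the same way. Several points are assumptions made for illustration: patterns are feature vectors, post-training in Step 30 is a simple average of the current and the repeated pattern, the jump back to Step 21 is read as a return to the check in Step 22, and the callables `record`, `is_confusable` and `notify` are hypothetical.

```python
from typing import Callable, Dict, List

Pattern = List[float]


def enroll_name(
    name: str,
    record: Callable[[], Pattern],
    directory: Dict[str, Pattern],
    is_confusable: Callable[[Pattern, Dict[str, Pattern]], bool],
    notify: Callable[[], None],
    k_max: int = 3,
) -> int:
    """Store a new name and return the number of repetitions that were used."""
    k = 0                                    # Step 20: reset loop counter
    pattern = record()                       # Step 21: input of the new name
    # Step 22: pass limit; Steps 23 and 24: comparison with stored names.
    while k < k_max and is_confusable(pattern, directory):
        notify()                             # Steps 27 and 28: inform, prompt
        repeated = record()                  # Step 29: reinput
        # Step 30: post-training, here an assumed element-wise average.
        pattern = [(p + r) / 2 for p, r in zip(pattern, repeated)]
        k += 1                               # Step 31: increase counter
    directory[name] = pattern                # Step 25: store the new name
    return k                                 # Step 26: terminate
```

As in the description, the name is stored even when the pass limit is reached while the pattern is still similar to an existing entry.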

What is claimed is:
 1. Method for enhancing the recognition probability of voice recognition systems wherein, after input of a term to be recognized, a post-training of the previously stored homonymic term is carried out by means of the input term, characterized by: a) input of the term to be recognized, b) comparison of the input term with terms previously stored in the voice recognition system, and c) if the term was unambiguously recognized: 1) execution of the desired action, 2) post-training of the corresponding stored term in the voice recognition system with the initially input term, and 3) termination of the process; d) if the comparison results in an uncertain recognition probability for several stored terms: 1) information of the system user and display or playing of these terms which are similar to the input term, and 2) prompting of the system user to reinput the initially input term.
 2. Method as claimed in claim 1, characterized in that the post-training is based on a comparison through correlation of the term that has been input with the previously stored homonymic term, wherein each of the characteristic voice patterns is determined and stored.
 3. Method as claimed in claim 1, characterized in that the number of displayed/played terms is predefined.
 4. Method as claimed in any one of claims 1-3, characterized in that, if a term is newly input into the voice recognition system, a comparison with the previously input terms is carried out after the new term has been input in order to determine whether the voice pattern of the new term is sufficiently distinct from the voice patterns of previously stored terms that no misrecognition or uncertain recognition is expected.
 5. Method as claimed in any one of claims 1-3, characterized in that, if the new term, in the context of a correlation comparison based on a defined criterion, results in an uncertain recognition or a strong similarity or probable match with previously stored terms, the system optically or acoustically informs the user and prompts him to repeat the input of the new term, which serves for post-training.
 6. Method as claimed in any one of claims 1-3, characterized in that the number of post-training passes is preselected.